- Path: news.NetVision.net.il!news
- From: Jack <avilev@netvision.net.il>
- Newsgroups: comp.sys.amiga.programmer
- Subject: Re: 680X0 -> PPC translator?
- Date: Wed, 03 Apr 1996 07:23:56 -0700
- Organization: NetVision LTD.
- Message-ID: <3162980C.2003@netvision.net.il>
- References: <31499F8E.26A9@netvision.net.il> <volker.0fw1@vb.franken.de> <315800D7.1854@sapiens.com> <volker.0g32@vb.franken.de> <315C198B.49C2@netvision.net.il> <volker.0g5w@vb.franken.de>
- NNTP-Posting-Host: ts006p2.pop4a.netvision.net.il
- Mime-Version: 1.0
- Content-Type: text/plain; charset=us-ascii
- Content-Transfer-Encoding: 7bit
- X-Mailer: Mozilla 2.0b6a (Win16; I)
-
- Volker Barthelmann wrote:
- >
- > Jack (avilev@netvision.net.il) wrote:
- > : Hi Volker, well i'll try to keep the lines shorter this time
- > : so you'll be able to read the text more conveniently and who
- >
- > Thanks!
- >
- > : know. you might actually be convinced that 680x0 -> PPC is possible. ;-)
- >
- > I doubt that. :-)
- >
- > : > Perhaps You can perform this analysis. An algorithm can't.
- > :
- > : why the hell not, if i can understand assembly source code,
- > : the machine can understand machine code right?!
- >
- > I don't know if You (or any human) can really 'understand' EVERY
- > piece of machine code. Of course typical assembly source can be
- > understood by humans, because it was written with that in mind.
-
- you don't have to understand what a program does in order to convert
- its code from one instruction set to another. all you look for are
- specific things and provided you know the op-codes of the 2 processors
- there should be no problem whatsoever.
-
- > : > Of course You can have a structure holding all Your variables. Now You pass
- > : > the address of this structure to an external function that writes some
- > : > values into it and You lost.
- > :
- > : i don't seem to care about that, get it through your head, i'm NOT
- > : going to intervene with what the program does with any memory area,
- > : just as long as this memory area is not later being used for code
- >
- > Unfortunately You have to know about all other memory areas as well,
- > because otherwise You cannot determine what is code and what is data.
-
- not necessarily. all static code is inside the program already; all there is
- to do is follow the logic of the program, keeping track of what memory areas
- are being used AND how. now don't be confused again: you don't need actual
- run-time pointer values in order to know how a memory area is being used, you use
- its 'symbol', i.e. the pointer variable, to represent it. if you follow through the
- entire program, never mind its logic, you can know which parts (within the exe)
- are code or data: any part which is jmp'ed to is code, otherwise it's data.
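
A rough sketch of the "anything jmp'ed to is code" rule, on a toy program rather than real 680x0 machine code. The instruction names, the one-cell-per-instruction layout and the function `mark_code` are all assumptions made up for illustration. It only follows direct targets; an indirect `jmp (a0)` would still need the pointer tracing described in this post.

```python
def mark_code(program, entry):
    """Return the set of addresses reachable as code from `entry`.

    `program` maps address -> (mnemonic, target_or_None). Every cell is
    one unit wide (a simplification; real 680x0 instructions vary in size).
    """
    code = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in code or addr not in program:
            continue
        code.add(addr)
        op, target = program[addr]
        # Direct control transfers: the target is code.
        if op in ("jmp", "jsr", "bcc"):
            worklist.append(target)
        # Everything except an unconditional jump or a return falls through.
        if op not in ("jmp", "rts"):
            worklist.append(addr + 1)
    return code


# Toy program: address 3 is never reached, so it is classified as data.
toy = {
    0: ("move", None),
    1: ("bcc", 4),
    2: ("jmp", 5),
    3: ("dc.w", None),
    4: ("rts", None),
    5: ("rts", None),
}
code = mark_code(toy, 0)
data = sorted(set(toy) - code)
print(sorted(code), data)
```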
-
- >
- > Think about: The program writes a value somewhere. Then copies it around,
- > shifts it, moves it again etc. and then sometime it reads it from where
- > it is now, loads it in a0 and does a jmp (a0) or so.
-
- oh god, how many times will i have to go through this. it doesn't matter
- at all, because i don't care if that area is jmp'ed to directly or indirectly.
- if i find out that some memory is used as a code area, i go back and trace
- where it was initially assigned and then start following the changes made to
- it, making sure to change the size of the area it points to so that the PPC
- code will fit nicely. really, it's that simple, and if you don't believe me
- then i'll make my point more concrete for ya: try looking at some assembly
- source which does what you just said, YOU can know (if you follow the flow)
- the exact area a certain pointer is pointing at, at a given point in the
- program, you can also figure out how it's being used and you can also
- figure out what needs to be changed for the PPC translation to work. if you
- can do it, bet an algorithm can too.
-
- >
- > : execution. then, and only then will i have to turn to all the locations
- > : where that pointer could have been assigned and then change the size
- > : argument which i already know to the appropriate sizes, once i figure
- > : out that the area is code, i will perform a well defined series of actions
- > : to resolve ALL dependencies related to that area, including size, write loops etc.
- >
- > Well, please define those 'well defined' series of actions.
-
- as soon as i find out the pointer is a CODE pointer i do the following:
-
- 1) trace back in the program to where that pointer was assigned.
- 2) decide whether the area it's pointing to is static (meaning within one of
-    the program's segments) or dynamic.
- 3) if (2) == dynamic, then calculate the 'source' code size and look it up within
-    one of the arguments passed just before the call/jmp is made. the argument will
-    be an immediate or stored in some variable; the point is it's INSIDE
-    one of the program's segments and it's a REAL VALUE.
-    change the size according to the translated code size. (i'm assuming the code
-    has already been chewed up and spat out)
- 4) if (2) == static, then increase the size of the hunk it's located in.
- 5) find the last write loop just before the call was made and change the counter's
-    end value condition, changing the move instruction to move bytes.
- 6) go on happily to other parts of the program.
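
Steps 3 to 5 above could look roughly like this once the tracing has been done. The `CodeArea` record and its field names are hypothetical; they just stand for whatever the analyser would have collected about one code area. Nothing here does the tracing itself.

```python
from dataclasses import dataclass


@dataclass
class CodeArea:
    is_dynamic: bool   # step 2: allocated at run time, or inside a hunk?
    size_arg: int      # size value found in the program for the allocation
    hunk_size: int     # size of the hunk containing a static area
    copy_count: int    # end value of the write loop's counter
    copy_unit: int     # bytes moved per iteration (4 for move.l, 1 for move.b)


def patch_area(area, old_code_size, new_code_size):
    """Adjust everything that depended on the old (680x0) code size."""
    if area.is_dynamic:
        # Step 3: only touch the argument if it really is the code size.
        if area.size_arg == old_code_size:
            area.size_arg = new_code_size
    else:
        # Step 4: grow the hunk by the difference.
        area.hunk_size += new_code_size - old_code_size
    # Step 5: copy byte by byte, so the counter is simply the new size.
    area.copy_count = new_code_size
    area.copy_unit = 1
    return area


dynamic = patch_area(CodeArea(True, 64, 0, 16, 4), 64, 100)
static = patch_area(CodeArea(False, 0, 1000, 16, 4), 64, 100)
print(dynamic, static)
```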
-
- HUH, i wrote it, let's see your response to that, Volker. flame me good this
- time, ok? :-)
-
- >
- > : > There are several memory allocating functions or other functions You can't
- > : > know anything about.
- > :
- > : then you take into account all of the various SYSTEM functions which allocate
- > : memory (there aren't many of them) and proceed as normal.
- >
- > What do You do if a program does an OpenLibrary("some_custom.library",foo);
- > and a jsr -some_strange_offset_I've_never_seen_before(a6)?
- > This function could call AllocMem. You don't even know what parameters it
- > takes etc.
-
- AHHHHH, that is where you're wrong (again, teehee). here's that phrase again,
- 'keep track'. i know, i know, this term is without a doubt overused in my articles,
- but hey, this is WAR, any means can be used to achieve the target: PPC dominance.
- now, if you save your stack status before every call is made, you can know which
- parameters on the stack belong to that function. for example, if the stack
- contained A,B,C before, and now D,E,F are pushed and then there's a call to
- some routine, then you know D,E,F are its arguments, don't you?
- bear in mind that the external library is PPC translated already, so any
- code-modification tricks it does won't need any changes.
- of course, __regarg functions (in C) don't use the stack for all parameters,
- i know, and some assembly programs like to do register passing of arguments.
- well, in that case it's truly more difficult but not impossible.
- if the call is to some C RTL function, then no sweat, i can go directly to
- the function, actually see how it uses the registers in its code,
- discern the call prototype and make the necessary adjustments. if it's a
- DOS library things might get a little complicated, but could be solved by
- some educated guessing as to what needs to be changed.
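
The A,B,C / D,E,F example above can be sketched as a tiny stack tracker. The event format and the name `infer_call_args` are invented for this sketch, and it assumes stack-passed arguments that the caller removes after the call; register-passed arguments are not covered.

```python
def infer_call_args(initial_stack, events):
    """Guess each call's arguments from what was pushed since the last
    saved stack status.

    `events` is a list of ("push", value), ("pop", None) or ("call", name).
    """
    stack = list(initial_stack)
    saved_depth = len(stack)  # stack status saved before the next call
    calls = []
    for kind, value in events:
        if kind == "push":
            stack.append(value)
        elif kind == "pop":
            stack.pop()
            # Popping below the saved status moves the boundary down.
            saved_depth = min(saved_depth, len(stack))
        elif kind == "call":
            calls.append((value, stack[saved_depth:]))
            # Assumption: the caller cleans up its arguments after the call.
            del stack[saved_depth:]
    return calls


calls = infer_call_args(
    ["A", "B", "C"],
    [("push", "D"), ("push", "E"), ("push", "F"), ("call", "some_routine"),
     ("push", "G"), ("call", "other_routine")],
)
print(calls)
```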
-
- >
- > : > You would have to adjust EVERYTHING that is in any way dependant on the code
- > : > size! How are You going to do this?
- > :
- > : all you have to look for is where it was allocated and where it is copied,
- > : assuming you're copying code that is. quite easily done if you follow the
- > : change made to that pointer while 'running' through the program's logic.
- >
- > If You write a program that only reads a normal assembly source file that
- > is known to copy some piece of code somewhere and Your program can change the
- > source to copy one byte more, I'll believe You (and call You god, if You want).
-
- well, if i have the time and energy i will, but 1st i have to write the analyser
- program to do that. that might take some time.
-
- > Please tell me more, I'm curious!
-
- well, the project analysed the information flow in some organization and tried
- to find the bottlenecks in that flow which ultimately caused financial losses
- to that company. i won't go into many details of how it's done cuz you'd have to
- have a background in automated-data-processing theory in order to comprehend the
- terms used and what i'm talking about. suffice to say that by the end, the project
- involved an ingenious, highly automated and controlled network of computers running
- software which dealt specifically with the problems at hand, which increased
- productivity in the short term and profits in the longer term.
-
- > Yes, You have. E.g. the allocator could assume that all requests are multiples
- > of 1024 and rely on that. Now the original code may have been a multiple
- > of 1024, but the PPC-code probably isn't. So when You pass the PPC-code-size
- > to the allocator it will go nuts.
-
- so what??? the area will later be used in some loop copying the code, right?! it'll
- have to use the actual byte/word count to do that. you look into the allocation 'prototype'
- and try to find that value. if you can't find it, then you're probably right, it
- uses some multiple of some size. in that case the minimum unit it would have to be
- is calculated by unit = actual_source_size/num_units_requested;
- then you use that unit size in the calculation of the PPC code size and you're done.
- of course there'll be some wasted memory, but who cares about that.
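
As a worked example of the unit calculation above, assuming the original request was an exact multiple of the unit (the function name and the sample numbers are made up):

```python
def units_for_translated_code(source_size, units_requested, ppc_size):
    """Return (unit, ppc_units, wasted_bytes) for the translated code.

    unit = actual_source_size / num_units_requested, as in the post;
    the PPC request is then rounded up to a whole number of units.
    """
    unit = source_size // units_requested
    ppc_units = -(-ppc_size // unit)  # ceiling division
    wasted = ppc_units * unit - ppc_size
    return unit, ppc_units, wasted


# 2048 bytes of 680x0 code requested as 2 units, 3000 bytes after translation.
result = units_for_translated_code(2048, 2, 3000)
print(result)
```

With these numbers the unit comes out as 1024, the translated code needs 3 units, and 72 bytes are wasted.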
-
- >
- > : granted, you're right that not all solutions are clear cut, but when it comes
- > : to exact sciences such as computer-science then i would disagree, especially
- > : since the problem we're dealing with isn't related to any human logic, that
- > : field even in computer-science hasn't been fully explored and understood yet.
- > : no, the only major problem is self-modification of the program otherwise you
- > : translate the code 1:1.
- >
- > Still I claim You can't even reliably decide what is code and what is data.
- > Even if self-modifying code is forbidden.
-
- that's absurd, if a human can follow an assembly source code and can know which
- parts are code and which ones are data so can a fucken algorithm.
-
- > Assume a program that has some kind of keyfiles. It has n areas that could
- > be code or data and Your algorithm has to decide that.
- > Assume that the addresses of those areas are in an array adr[n].
- > Now the program calls the system function Open("env:keyfile",MODE_OLDFILE),
- > reads all longwords from the file, adds them up and calls adr[sum%n].
- > To know which areas could be called You would have to know all valid
- > keyfiles. Of course no algorithm knows them and therefore can't decide what
- > is code. qed.
-
- if that's your example of a real value, think AGAIN. this can easily be solved
- by going through all the various hunks of the program trying to find CODE sections.
- when you find a code instruction that 'makes sense', mark the section's entry point.
- if somebody else calls it you know it's code for sure, then you translate it.
- you go through all such code sections and work out when and where they're being
- used, making sure to translate only code sections actually being used. if a sequence
- of data words looks like code it might actually not be code, so you have to defer the
- translation until you're sure about it. chances are it's code: if more than 2-3
- instructions exist in sequence, that probably is code and can thus be translated.
- if it was just random data, then nothing happened, we just randomised it again.
- better think harder, now flame me.
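
The "2-3 instructions in sequence" heuristic above might be sketched like this. The `KNOWN` table holds only three 680x0 opcode words (nop, rts, moveq #0,d0) and stands in for a real decoder, so treat it as a toy; the function name and threshold are assumptions too.

```python
# Toy stand-in for a 680x0 decoder: just three one-word opcodes.
KNOWN = {0x4E71: "nop", 0x4E75: "rts", 0x7000: "moveq #0,d0"}


def probable_code_runs(words, min_run=3):
    """Return (start, end) index pairs of runs of at least `min_run`
    consecutive words that decode as instructions. `end` is exclusive."""
    runs = []
    start = None
    # The trailing None acts as a sentinel that closes a run at the end.
    for i, word in enumerate(list(words) + [None]):
        if word in KNOWN:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    return runs


words = [0x1234, 0x7000, 0x4E71, 0x4E75, 0xFFFF, 0x4E71]
runs = probable_code_runs(words)
print(runs)
```

A lone decodable word (the last one here) is left alone, which matches deferring the translation until you're sure.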
-
- Avi Lev.
-